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1 Introduction 



Multiple hypothesis testing is a fundamental problem in the modern research for 
high dimensional inference, with wide applications in scientific fields, such as biology, 
medicine, genetics, neuroscience, economics and finance. For example, in genome- 
wide association studies, massive amount of genomic data (e.g. SNPs, eQTLs) are 
collected and tens of thousands of hypotheses are tested simultaneously to find if 
any of these genes are associated with some observable traits (e.g. blood pressure, 
weight, some disease); in finance, thousands of tests are performed to see which fund 
managers have winning ability (Barras, Scaillet & Wermers 2010) 

False Discovery Rate (FDR) has been introduced in the celebrated paper by Ben- 
jamini & Hochberg (1995) for large scale multiple testing. By definition, FDR is the 
expected proportion of falsely rejected null hypotheses among all of the rejected null 
hypotheses. The classification of tested hypotheses can be summarized in Table 1: 



Table 1: Classification of tested hypotheses 





Number 


Number 




Number of 


not rejected 


rejected 




True Null 


U 


V 


Po 


False Null 


T 


S 


Pi 




p-R 


R 


P 



Various testing procedures have been developed for controUing FDR, among which 
there are two major approaches. One is to compare the ordered P-values respectively 
with a sequence of threshold values (Benjamini & Hochberg 1995). Specifically, let 
< P(2) < • • • < P{p) be the ordered observed P-values of p hypotheses. Define k — 
max|i : < ia/pj and reject H^i), ■ ■ ■ , H^k)^ where a is a specified control rate. If 
no such i exists, reject no hypothesis. The other related approach is to fix a threshold 
value and reject the hypotheses with P-values no greater than this threshold (Storey 
2002). The equivalence between the two methods has been theoretically studied by 
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Storey, Taylor & Siegmund (2004) and Ferreira & Zwinderman (2006). Finding such 
a common threshold is based on a conservative estimate of FDR. Specifically, let 

FDR(t) = %t/{R{t) V 1), where R{t) = i^{Pi : Pi < t} is the number of total 
discoveries with the threshold t and po is an estimate of po- Then solve t such that 
FDR(t) < a where a is a predetermined control rate, say 15%. 

Both procedures have been shown to perform well for independent test statis- 
tics. However, in practice, test statistics are usually correlated. Although Clarke & 
Hall (2009) argued that when the null distribution of test statistics satisfies some 
conditions, dependence case in the multiple testing is asymptotically the same as 
independence case, multiple testing under general dependence structures is still a 
very challenging and important open problem. Efron (2007) noted that correlation 
must be accounted for in deciding which null hypotheses are significant because the 
accuracy of false discovery rate techniques will be compromised in high correlation 
situations. There are several literatures to show that Benjamini-Hochberg procedure 
or Storey's procedure can control FDR under some special dependence structures, e.g. 
Positive Regression Dependence on Subsets (Benjamini & Yekutieli 2001) and weak 
dependence (Storey, Taylor & Siegmund 2004). Sarkar (2002) also shows that FDR 
can be controlled by a generalized stepwise multiple testing procedure under positive 
regression dependence on subsets. However, even if the procedures are valid under 
these special dependence structures, they will still suffer from efficiency loss without 
considering the actual dependence information. In other words, there are universal 
upper bounds for a given class of covariance matrices. 

In the current paper, we will develop a procedure for high dimensional multiple 
testing which can deal with any arbitrary dependence structure and fully incorporate 
the covariance information. This is in contrast with Sun & Cai (2009) who developed 
a multiple testing procedure under a hidden Markov model and Leek & Storey (2008) 
and Friguet, Kloareg & Causeur (2009) where the factor models are imposed. More 
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specifically, consider the test statistics 

where S is known and p is large. We would like to simultaneously test Hoi : /ij = 
vs Hii : fii 7^ for i = I,-- - ,p. Note that S can be any non- negative definite 
matrix. Our procedure is called Principal Factor Approximation (PEA). The basic 
idea is to first take out the principal factors that derive the strong dependence among 
observed data Zi, - ■ ■ ,Zp and to account for such dependence in FDP calculation. 
This is accomplished by the spectral decomposition of S and taking out the largest 
common factors so that the remaining dependence is weak. We then derive the 
theoretical distribution of false discovery proportion V/R when p is large that accounts 
for the strong dependence. The realized but unobserved principal factors that derive 
the strong dependence are then consistently estimated. We will further discuss the 
application of our result in multiple testing. 

The motivation for this problem setup comes from genome-wide association stud- 
ies. We are especially interested in the high dimensional sparse problem, that is, p is 
very large, but the number oi /li ^ is very small. In section 2, we will further explain 
why E is known in practice. Sections 3 and 4 present the theoretical results and the 
proposed procedures. In section 5, the performance of our procedures is critically 
evaluated by various simulation studies. Section 6 is about the real data analysis. All 
the proofs are relegated to the Appendix. 

2 Motivation of the Study 

In genome-wide association studies, consider p SNP genotype data for n individual 
samples, and further suppose that a response of interest (i.e. gene expression level 
or a measure of phenotype such as blood pressure or weight) is recorded for each 
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sample. The SNP data are conventionally stored in an n x p matrix X = (xij), with 
rows corresponding to individual samples and columns corresponding to individual 

SNPs . The total number n of samples is in the order of hundreds, and the number 
p of SNPs is in the order of tens of thousands. 

Let Xj and Y denote, respectively, the random variables that correspond to the jth 
SNP coding and the phenotype. The biological question of the association between 
genotype and phenotype can be restated as a problem in multiple hypothesis testing, 
i.e., the simultaneous tests for each SNP j of the null hypothesis Hj of no association 
between the SNP Xj and Y. Consider the marginal linear regression between Y and 
Xf 

mm E{Y - a, -b,X,f, j = !,•••, p. (1) 

aj,bj 

Let aj and /3j be the solution to (1). We wish to simultaneously test the hypotheses 
Hoj: Pj^O vs H.j: ^,-^0, J = (2) 

to see which SNPs are correlated with the phenotype. 

Recently statisticians have increasing interests in the high dimensional sparse 
problem: although the number of hypotheses to be tested is large, the number of 
false nulls {(3j ^ 0) is very small. For example, among the 2000 SNPs there are 
maybe only 10 SNPs which contribute to the variation in phenotypes or certain gene 
expression level. Our purpose is to find out these 10 SNPs by multiple testing with 
some statistical accuracy. 

Because of the correlations among Xi, - ■ ■ , Xp, based on a random sample of size 
n, the least-squares estimators for in (1) are also correlated. The fol- 

lowing result describes the joint distribution of {Pj}^=i. The proof is straightforward. 

Proposition 1. Let f3j he the least-squares estimator for f3j in (1) based on n data 
points, pki be the sample correlation between X^ and Xi, and Uk be the sample standard 
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deviation for X^. Assume that the conditional distribution ofY given X\ , • • • , Xp is 

N{fi{Xi, • • • , Xp), a^). Then, conditioning on {Xij} the joint distribution of 

is ,/3p)"^ ~ iV((/3i, • • ■ ,/3p)^,'E*), where the {k,l)th element in S* is = 

For ease of notation, let Zi,--- ,Zp be the standardized random variables of 
• • • , /3p, that is, 

i = p. (3) 



SD(A) (^/iV^^i) ' 

Then, conditioning on {Xij}, 

{Z,,--- ,Zpfr~.N{{fx,,--- (4) 

where fii = s/nf^^iju and covariance matrix S has the (A;, /)th element as 'pu- Si- 
multaneously testing (2) based on • • • is thus equivalent to testing 



ifoj : A«j = vs i^ij : /ij 7^ 0, j = l,...,p (5) 

based on (Zi, • • • , Z^^ . 

In (4), S is the population covariance matrix of {Z\, • • • , Z^^ , and is known. The 
covariance matrix S can have arbitrary dependence structure. Even if the population 
correlation matrix of the SNP data has certain dependence structure, S can still be 
quite different because the sample size n is relatively small. 



3 Estimating False Discovery Proportion 

Prom now on assume that among all the p null hypotheses, po of them are true and pi 
hypotheses (pi = p — Po) are false, and pi is supposed to be very small compared to p. 
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For a fixed rejection threshold t, we will reject those P-values no greater than t and 
select them as significance. Because of its powerful applicability, this procedure has 
been widely adopted by many statisticians. See Storey (2002), Efron (2007, 2010), 
among others. Our goal is to find a common threshold t such that the decision rule has 
nice statistical properties in multiple testing problem (|5| based on the observations 
(lij) under arbitrary dependence structure of S. 



3.1 Approximation of FDP 

Define the following empirical processes: 

V{t) = i^{true null Pi : Pi < t}, 

S{t) = i^ifalse null Pi : Pi < t} and 

R{t) = i^{Pr.P^<t}, 

where t G [0, 1]. V{t), S{t) and R{t) are the number of false discoveries, the number of 
true discoveries, and the number of total discoveries, respectively. Obviously, R(t) = 
V(t) + S{t), and V{t), S(t) and R{t) are all random variables, due to the randomness 
of the test statistics {Zi, ■ ■ ■ , Zp)'^. Moreover, R{t) is observed given some threshold 
value t in an experiment, but V{t) and S{t) are both unobserved. 

By definition, FDP(t) = V{t)/R{t) and FDR(t) = E V{t)/R(t) . The goal is 
to control FDR()f:) at a predetermined rate a, say 15%. There are also substantial 
research interests in the statistical behavior of the number of false discoveries V{t) 
and the false discovery proportion V{t)/R{t)^ which are unknown but realized. 

We will explore the distribution of V{t)/ R{t) for the high dimensional sparse case 
Pi ^ p. Suppose (Zi, ■ ■ ■ , Zpy ~ N{{fii, ■ ■ ■ , fip)^, S). The covariance matrix S has 
the {k, l)th element as pki with pkk = 1 so that it is a correlation matrix. S can be 
any non-negative definite matrix. Our setting encompasses the problem in Section 
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2. Before we introduce our procedure to deal with the arbitrary dependence case, 
we will give the following definition for weakly dependent normal random variables, 
which is fundamental to our method. 

Definition 1. Suppose {Ki, • • • , Kp)'^ ~ N{{9i, • • • , Op)'^, A). Then Ki, ■ ■ ■ ,Kp are 

called weakly dependent normal variables if 

lim p~'^y^\aij\ = 0, (6) 

where aij denote the {i,j)th element of covariance matrix A. 

Our procedure is called principal factor approximation (PFA). The basic idea is 
that any {Zi, ■ ■ ■ , Zp)^ ~ N{{jii, ■ ■ ■ , fip)^ , S) can be decomposed as a factor model 
with weakly dependent normal random errors. The details are shown as follows. 
Firstly apply the spectral decomposition to the covariance matrix S. Suppose the 
eigenvalues of S are Ai, • • • ,Xp, which have been arranged in decreasing order. If the 
corresponding orthonormal eigenvectors are denoted as 7^, • • • ,7^, then 

S = X^A,7.7r (7) 

i=l 

If we further denote A — Yl^=k+i '^i'lill where k is some well-chosen integer value, 
then 

l|A||| = Ati + --. + Aj, (8) 

where || • \\f is the Frobenius norm. Let L = {VM.1i, ■ ■ ■ , V^lk)^ which is p x k 
dimensional. Then the covariance matrix S can be expressed as 

S = LL^ + A, (9) 
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and Zi, - ■ ■ , Zp can be written as 

k 

Zi= fii + J2^ihWh + Ki, i = l,---,p, (10) 



h=l 



where (fty, ■ ■ ■ , bpj) = yJXj'^^, the factors are Wh ~ A^(0, 1) and the random errors 
are (i^i,--- ,Kp)'^ ~ A^(0,A). Furthermore, Wi,--- ,Wk are independent of each 



other and independent of Ki,--- ,Kp. In expression (10), {/ij = 0} correspond to 



the true null hypotheses, while {fii ^ 0} correspond to the false ones. Note that 



although (10) is not exactly a classical multifactor model because of the existence 
of dependence among K^-, - ■ ■ ,Kp,we can nevertheless show that (i^i, ■ ■ ■ , Kp)^ is a 
weakly dependent vector if the number of factors k is appropriately chosen. 

We now discuss how to choose k such that (f^i, ■ ■ ■ ,Kp)'^ is weakly dependent. 
Denote by aij the {i,j)th element in the covariance matrix A. If we have 

p-\^l+i + ■■■ + K)'^' ^ as p ^ oo, (11) 

then 

p"^ ^ \aij\ < p'^\\A\\f = p'^^iXl+i H h Aj)^/^ — ^ as p ^ cx), 

where the first inequality is by the Cauchy-Schwartz inequality. Note that Xlf=i ~ 



tr(S) = p, so that (11) is self-normalized. Therefore, by definition {Ki, ■ ■ ■ ,Kp) is 



weakly dependent. In practice, we always choose the smallest k such that 



Afc+i H h 



Ai + ■ ■ ■ + Ap 
holds for a predetermined small e, say, 0.01. 

Theorem 1. Suppose {Zi, ■ ■ ■ , Zp)^ ~ N{{fii, ■ ■ ■ , fip)'^ , S). Choose an appropriate 
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k such that 



Ai + • ■ ■ + A„ 



0{p-^) for 6>0. 



Let ^/Xjjj = {bij, ■ ■ ■ , hpjY /or j = 1, ■ ■ ■ , and {Wi, ■ ■ ■ , Wk)'^ ~ A^fc(0, 1^). Then, 



lim FDP(t) 

po—^oo 



^-^iSjirae null} 










^{ai{zt/2 + r]i + l^i)) + ^{ai{zt/2 - m 





:i2) 



where = (1 - ELi^^J"'/'. Vi = Yl=ihhWh, and $(■) and Zt,2 = ^-\t/2) are 
the cumulative distribution function and the t/2 lower quantile of a standard normal 
distribution, respectively. 



Note that condition (CO) implies that Ki 



Kp are weakly dependent random 



variables, but (11) converges to zero at some polynomial rate of p. 



The result of the asymptotic distribution of FDP(t) in Theorem 1 is new, compared 
with the current research in multiple testing for general dependence structure. To the 
best of our knowledge, it is the first result to fully capture the behavior of FDP(t) for 
high dimensional sparse problem, and the impact of dependence is explicitly spelled 
out. It is also closely connected with the existing results for independence case and 
weak dependence case. Let bih = for i 



1 , ■ ■ ■ ,p and h = I,-- - ,k in (10) and 
, Kp are weakly dependent or independent normal random variables, then 
it reduces to the weak dependence case or independence case, respectively. In the 



above two special cases, the numerator of (12) is just pQt. Storey (2002) used an 
estimate for po, resulting an estimator of FDP(t) as pot/R{t). This estimator has 
been shown to perform well under independency and weak dependency. However, 
for general dependency. Storey's procedure will not work well because it ignores the 



correlation effect among the test statistics, as shown by (12). Further discussions for 
the relationship between our result and the other leading research for multiple testing 
under dependence are shown in Section 3.4. 
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The results in Theorem 1 can be better understood by some special dependence 
structures as follows. These specific cases are also considered by Roquain & Villers 
(2010) and Friguet, Kloareg & Causeur (2009) under somewhat different setting, but 
results in Examples 1 and 2 are new. 

Example 1: [Equal Correlation] If S has pij = p G [0, 1) for i ^ j, then we 
can write 

Zi= + y/pW + a/1 - pKi i = !,■■■ ,p 

where W ~ A^(0, 1), i^'j ~ ^(0, 1), and W and all KiS are independent of each other. 
By Theorem 1, 



lim FDP(t) = 

po—^oo 



Po 


<^{d{Zt/2 + ^W)) + $(rf(^i/2 - ^W)) 






^{d{zt/2 + y/pW + Pi)) + ^{d{zt/2 - y/pW 





where d = {1 — p) 

Example 2: [Multifactor Model] Consider a multifactor model: 

Zi = pi + r]i + a'^Ki, i = 1, ■ ■ ■ , p, (13) 

where rji and are defined in Theorem 1 and Ki ~ A^(0, 1) for i = 1, ■ ■ ■ ,p. All the 
WhS and Ki's are independent of each other. In this model, Wi, ■ ■ ■ , Wk are the k 



common factors. By Theorem 1, expression (12) holds 



Although the distribution of FDP(t) in Example 2 is the same as that in Theorem 
1, the extension from Example 2 to Theorem 1 is technical. The key difference is that 
Example 2 assumes a multifactor model with independent random errors for the test 
statistics and Theorem 1 relaxes this restricted assumption largely to the arbitrary 
dependence structure. 

In Theorem 1, since FDP is bounded by 1, taking expectation on both sides of 



the equation (12) and by the Portmanteau lemma, we have the convergence of FDR: 
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Corollary 1. Under the assumptions in Theorem 1, for the high dimensional sparse 
case, 



lim FDR(t) = E 



T.i&{true null} \ ^{ai{Zt/2 + Vi)) + ^{ai{zt/2 - Vi)) 



Y7i=i \ ^{ai{zt/2 + r]i + + ^{ai{zt/2 - r]i - l^i)) 



(14) 



The expectation on the right hand side of ( 14 ) is with respect to standard multi- 
variate normal variables (IVi, ■ ■ ■ , WkY^ ~ ^^"^(0, 1^). 



The proof of Theorem 1 is based on the following result. 



Proposition 2. Under the assumptions in Theorem 1, 



p 

lim p~^R{t) =P~^y^\ ^{ai{zt/2 + rii + l^i)) + ^{ai{zt/2 - Vi - l^i)) 
lim Po V(t) = ^ V Wa.i{zti2 + r],)) + $(0,(2:4/2 - r],)) 

po^oo ^ — ^ L 

i£{true null} 



(15) 
(16) 



The proofs of Theorem 1 and Proposition 2 are shown in the Appendix. 



3.2 Estimating FDP 

In Theorem 1 and Proposition 2, the summation over the set of true null hypotheses 
is uncomputable, because it is not known which factor loadings correspond to the 
true nulls. However, due to the high dimensionality and sparsity, both p and po are 
large and pi is relatively small. Therefore, we can use 



r 1 

[$(ai(^V2 + Vi)) + '^{ai{zt/2 - Vi))\ (17) 

i=l 
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conservative surrogate for 



J2 \^iaiizt/2 + Vi)) + ^{ai{zt/2-Vi)) 

is {true null} 



Since only pi extra terms are included in (17), the substitution is accurate enough. 



The mean of V(t) is E J2ie{true nuU} — ^) = Po^, since the P-values corre- 
sponding to the true null hypotheses are uniformly distributed. However, the variance 
of V(t) can be large when the test statistics Zi, - ■ ■ , Zp are dependent. Owen (2005) 
has theoretically studied the variance of the number of false discoveries. In our frame- 



work, expression (17) is a function of i.i.d. standard normal variables. Given t, the 



variance of (17) can be obtained by simulations and hence variance of V{t) is approx- 



imated via (17). Relevant simulation studies will be presented in Section 5. 



In recent years, there have been substantial interests in the realized random vari- 
able FDP itself, instead of controlling FDR, as we are usually concerned about the 
number of false discoveries in a given experiment, rather than an average of FDP 
for hypothetical replications of the experiment. See Genovese & Wasserman (2004), 
Meinshausen (2005), Efron (2007), etc. In our problem, it is known that the approx- 
imate asymptotic distribution of V(t)/po is 



PO ^ [*(«*(^V2 + + '^{a^izt/2 - Vi)) ■ (19) 

1=1 



Let 



FDPA(t) = [Ha,{zt/2 + m)) + Ha,{zt/2-v^))])/R{t), 

i=l 

if R{t) 7^ and FDPA(t) = when R{t) = 0. Given observations Zi, - ■ ■ ,Zp of the 
test statistics Zi, - ■ ■ , Zp, if the unobserved but realized factors Wi, ■ ■ ■ , Wk can be 
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estimated by Wi, • • ■ , Wk, then we can obtain an estimator of FDPA(t) by 



FDP(t) = min ( ^ l^a^izt/^ + %)) + $(a.(^i/2 - , Rit)) I R{t) 



(20) 



1=1 



when R{t) ^ and FDP(t) = when R{t) = 0. Note that in (|20[, r/^ = ^^^^ khWh 
is an estimator for rji = X]ft=i ^ihWh- 

, Wk based on 



, Zp, we choose the smallest 75% of |zi|'s. For 



The following procedure is one practical way to estimate Wi, 
the data. For observed values zi, 
ease of notation, assume the first m ZiS have the smallest absolute values. Then 
approximately 

k 

Z, = Y,hhWh + K,, ^ = l,---,m. (21) 

h=l 



The approximation from (10) to (21) stems from the intuition that large |/ij|'s tend to 
produce large l^il's and the sparsity makes approximation errors negligible. Finally we 



apply the robust Li-regression to the equation set (21 ) and obtain the least-absolute 
deviation estimates Wi,--- ,Wk- The estimator (20) performs significantly better 



than Efron (2007) 's estimator in our simulation studies. One difference is that in our 
setting S is known. The other is that we give a better approximation as shown in 
Section 3.4. 



3.3 Asymptotic Justification 

Theorem 2 shows the asymptotic consistency of Li— regression estimators under model 



(21). Portnoy (1984b) has proven the asymptotic consistency for robust regression 
estimation when the random errors are i.i.d. However, his proof does not work here 
because of the weak dependence of random errors. Our result allows k to grow with 
m, even at a faster rate of o(m^/^) imposed by Portnoy (1984b). 
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Theorem 2. Suppose (21) is a correct model. Let w be the Li — regression estimator: 

m 

w = argmin^g^fe ^ |Zi - bf/3| (22) 



i=l 



where bj = {bn, ■ ■ ■ , bik)^ . Let w = {wi, • • • , Wk)"^ be the realized values of {Wh}h=i- 
Suppose k = 0{m'^) for < k, < 1 — 6 . Under the assumptions 

(CI) Y.U+i>^]<Vforv = 0{m^^), 

(C2) There exists a constant d > such that 

m 

lim sup m^^y I[\hl\i\ < d) = 0, 



'^^~||U|1 = 1 



i=l 



(C3) amax/omin < S for somc constant S when m — oo where l/oj is the standard 
deviation of K^, 

(C4) a^in = 0(m(i-«)/2)^ 
We have ||w — w||2 = Op{ 

(CI) is stronger than (CO) in Theorem 1 as (CO) only requires ^J^fc+i = 
Q^p2-25y (C2) ensures the identifiability of /3, which is similar to Proposition 3.3 in 
Portnoy (1984a). (C3) and (C4) are imposed to facilitate the technical proof. 

We now show in Theorem 3 the asymptotic consistency of FDP(t) based on 



Li— regression estimator of W = {Wi,--- , H^)"^ in model (21 ). The proof of Theorem 
3 is based on the result in Theorem 2. 

Theorem 3. // the assumptions in Theorem 2 are satisfied, and in addition, the 
following conditions are satisfied: 

(C5) R{t)/p > H for H >0 asp ^oo. 
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(C6) mini<j<pmin(|zt/2 + bfw|, \zt/2 - bfw|) > r > 0, 
then |FDP(t) - FDPA(t)| = Op(y^). 



In Theorem 3, (C6) is a reasonable condition because Zt/2 is a negative number 
when threshold t is very small and bj'w is a realization from a normal distribution 
=1 ^i/i) '^ith X]h=i ^j^/i < 1- Thus 2;t/2 + bf w or zt/2 — bf w is unlikely close to 
zero. Our proof shows a more general result. Under conditions of Theorem 3, 

|FDP(t) - FDPA(t)| = Op(||w - wlh) 



The results in Theorems 2-3 are based on the assumption that (21) is a correct 



model. In the following we will show that even if (21) is not a correct model, the 
effects of misspecification are negligible when p is sufficiently large. To facilitate 
the mathematical derivations, we instead consider the least-squares estimator. Sup- 



pose we are estimating W = {Wi, ■ ■ ■ , Wk)'^ from (10). Without loss of generahty, 
assume the true values of {VF/i}^^^^ are 0. Let X be the design matrix of model 
(10), then the least-squares estimator for W is Wj^g = (X"^X)^-'^X"^(^ + K), where 



^ = (/ii, ■ ■ ■ , /ip)"^ and K = {Ki, ■ ■ ■ , Kp)^ . Instead, we estimate Wi, ■ ■ ■ , Wk based 



on the simplified model (21) with m = p, which ignores sparse {yUj}, then the least- 
squares estimator for W is Wls = (X^X)^^X^K. The following result shows that 



the effect of misspecification in model (21) is negligible when p — t- oo: 
Theorem 4. The bias due to ignoring non-nulls is controlled by 

u 

1/2 



|Wls-w;s||2<||/^||2($^A71 

i=l 



In Theorem 1, we can choose appropriate k such that > 1 and Aj — > cxd as 
p — 7- oo for i < fc. Therefore, XliLi Aj^^ — ?■ as p — ?■ oo is a reasonable condition. 
When {niYl^^ are truly sparse, it is expected that ||a*||2 grows slowly or is even 
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bounded so that the bound in Theorem 4 is small. For Li— regression, it is expected 
to be even more robust to the outliers in the sparse vector {f^i^^—i- 



3.4 Relation with Other Methods 



Efron (2007) proposed a novel parametric model for V{t): 



V{t) = Pot 



1 + 2A 



'-Zt/2)<P{Zt/2 

V2t 



(23) 



where A ~ A^(0, a^) for some real number a and 0(-) stands for the probability density 
function of standard normal distribution. The correlation effect is explained by the 
dispersion variate A. His procedure is to estimate A from the data and use 



Pot 



1 + 2A 



'-Zt/2)<P{Zt/2 

^/2^ 



R{t) 



(24) 



as an estimator for FDP(t). Note that the above expressions are adaptations from 
his procedure for the one-sided test to our two-sided test setting. In his simulation, 
the above estimator captures the general trend of the FDP, but it is not accurate 
and deviates from the true FDP with large amount of noise. Consider our estimator 



FDP(t) in (20). Write % = a^Qi where Qi ~ A^(0, 1). When -> for \/i e 



{true null}, by the second order Taylor expansion 

Pot 



FDP(t) 



R{t) 



1+ E 

is {true null} 



-Zt/2)4'{.Zt/2 

Pot 



(25) 



By comparison with Efron's estimator, we can see that 



A 



V2po, 



(26) 



is {true null} 



Thus, our method is more general and more precise. 
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Leek & Storey (2008) considered a general framework for modeling the depen- 
dence in multiple testing. Their idea is to model the dependence via a factor model 
and reduces the multiple testing problem from dependence to independence case via 
accounting the effects of common factors. They also provided a method of estimating 
the common factors. In contrast, our problem is different from Leek & Storey's and 
we estimate common factors from very different methods. In addition, we provide 
the approximated FDP formula and its consistent estimate. 

Friguet, Kloareg & Causeur (2009) followed closely the framework of Leek & 
Storey (2008). They assumed that the data come directly from a multifactor model 
with independent random errors, and then used the EM algorithm to estimate all the 
parameters in the model and obtained an estimator for FDP(t). In particular, they 



subtract rji out of (13) based on their estiamte from the EM algorithm to improve the 
efficiency. However, the estimated number of factors in their studies is usually small 
by their EM algorithm, thus leading to inaccurate estimated FDP(t). Moreover, it is 
hard to derive theoretical results based on the estimator from their EM algorithm. 
Compared with their results, our procedure does not assume any specific dependence 
structure of the test statistics. What we do is to decompose the test statistics into an 
approximate factor model with weakly dependent errors, derive the factor loadings 
and estimate the unobserved but realized factors by Li-regression. Since the theo- 



retical distribution of V{t) is known, estimator (20) performs well based on a good 



estimation for Wi, ■ ■ ■ , Wk- See Section 5 for relevant simulation results. 



4 Approximate Control of FDR 

In this section we will propose some ideas that can asymptotically control the FDR, 
not the FDP, under arbitrary dependency. Although their validity is yet to be estab- 
lished, promising results reveal in the simulation studies. Therefore, they are worth 
some discussion and serve as a direction of our future work. 
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Suppose the number of false null hypotheses pi is known. If the signal /ij for 
i G {false null} is strong enough such that 



^(ai{zt/2 + r]i + l^iij +^(ai{zt/2 - Vi - l^i 
then asymptotically the FDR is approximately given by 

FDR(t) = E 



(27) 



El 


=1 


^{ai{zt/2 + r]i)) + ^{ai{zt/2 - Vi)) 






$ 


{ai{zt/2 + Tji)) + ^{ai{zt/2 - Vi)) 


+ Pi 



(2J 



which is the expectation of a function of Wi, ■ ■ ■ , Wk- Note that FDR(t) is a known 
function and can be computed by Monte Carlo simulation. For any predetermined 
error rate a, we can use the bisection method to solve t so that FDR()f:) = a. Since k 
is not large, the Monte Carlo computation is sufficiently fast for most applications. 



The requirement (27) is not very strong. First of all, $(3) ~ 0.9987, so (27) 



will hold if any number inside the $(■) is greater than 3. Secondly, 1 — Y2h=i ^ih 



IS 



usually very small. For example, if it is 0.01, then a, = (1 — X]h=i ^ih) ~ 
which means that if either zt/2 + f]i + f^i or zt/2 — f]i ~ f^i exceed 0.3, then (27) is 



approximately satisfied. Since the effect of sample size n is involved, (27) is not a 
very strong condition on the signal strength 



5 Simulation Studies 



In the simulation studies, we consider p = 2000, n = 100, a = 2, the number of false 
null hypotheses pi = 10 and the nonzero Pi = 1, unless stated otherwise. We will 
present 6 different dependence structures for S of the test statistics (Zi, ■ ■ ■ , Zp)'^ ~ 
N{{fii, ■ ■ ■ 1 ^p)^ 1 S). Following the setting in section 2, S is the correlation matrix 
of a random sample of size n of p— dimensional vector Xj = (Xji,--- ,Xjp), and 
/ij = y/nPjdj/a, j = 1, ■ ■ ■ ,p. The data generating process vector Xj's are as follows. 
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[Equal correlation] Let = [Xi, • • • , XpY ~ A^p(0, S) where S has diago- 
nal element 1 and off-diagonal element 1/2. 

[Fan & Song's model] For X = (Xi, • • • ,Xp), let {Xk}f^^ be i.i.d. A^(0, 1) 



where {^kYk^i^m are standard normally distributed. 

[Independent Cauchy] For X = (Xi, • • • ,Xp), let {Xk}f2i be i.i.d. Cauchy 
random variables with location parameter and scale parameter 1. 

[Three factor model] For X = (Xi, • • • , X^), let 



where W^'^^ ~ A^(-2, 1), W^^^) ^ ^(^^ ^-^^ ^{z) ^ ^^(4^ ^(1)^ ^(2)^ ^(3) 
1), and Hj are i.i.d. Ar(0, 1). 

[Two factor model] For X = (Xi, • • • , Xp), let 



where W^(^) and Vl^^^) ^re i.i.d. A^(0, 1), pf' and pf ^ are i.i.d. C/(-l, 1), and 
are i.i.d. iV(0,l). 

[Nonlineeir factor model] For X = [Xi, • • • , Xp), let 

X,- = sin{pfw^^^) + s^n(pf ) cxp(|pf + H,, 

where W^^^^ and l^^^^ are i.i.d. iV(0, 1), p^-^^ and pf^ are i.i.d. 1), and Hj 

are i.i.d. A^(0, 1). 



and 



Xfc = J]X;(-1)'+V5 + Wl--efe, fc = 1901,-- - ,2000, 




X^ = pWiy(^ + pf + pf + 



X,. = pj. V« + pf + 
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Fan & Song's Model has been considered in Fan & Song (2010) for high dimen- 
sional variable selection. This model is close to the independent case but has some 
special dependence structure. Note that although we have used the term "factor 
model" above to describe the dependence structure, it is not the factor model for the 
test statistics Zi,--- , Zp directly. Instead, we assume a factor model for the data 
Xi, ■ ■ ■ , Xp to construct some other dependence structures for the covariance matrix 
S. 

Convergence of FDP: Our result in Theorem 1 is based on asymptotic conver- 
gence. Without loss of generality, we consider a dependence structure based on the 
two factor model above. Let n = 100, pi = 10 and cr = 2. Let p vary from 100 to 
1000 and t be either 0.01 or 0.001. In Figure [T| we will show that the convergence is 
fast as p increases. Therefore, the asymptotic result in Theorem 1 should work very 
well when there are hundreds or thousands of hypotheses tested simultaneously. 

Variance of V{t): Variance of false discoveries in the correlated test statistics is 
usually large compared with that of the independent case, due to correlation struc- 
tures. In Table 2, for high dimensional sparse case, we compare the true variance 



of number of false discoveries, the variance of expression (18) (which is infeasible in 



practice) and the variance of expression (17) under 6 different dependence structures. 



It shows that the variance computed based on expression (17) approximately equals 
the variance of number of false discoveries. Therefore for high dimensional sparse 
case, we provide a fast and alternative method to estimate the variance of number of 
false discoveries in addition to the results in Owen (2005). 

Comparing methods of controlling FDR: Under different dependence struc- 



tures, we compare FDR values for our procedure PFA in equation (28) with pi 
known. Storey's procedure and Benjamini-Hochberg procedure. Table |3] shows that 
our method performs much better than Storey's procedure and Benjamini-Hochberg 
procedure, especially under strong dependence structures (rows 1, 4, 5, and 6), in 
terms of both mean and variance of the distribution of FDP. 
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p = 500 
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t = 0.01 



0.0 02 D.4 0.6 0.0 



p = 500 
t - 0.001 



0.0 02 D.4 0.6 O.B 



jj = 500 
( - 0.001 



0.0 D2 0.4 0.6 DO 



p = 1000 
t = 0.01 



DO 0:2 D.4 O.B DE 1.0 
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p - lOOO 

t = 0.001 
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p = 1000 
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□ 0.2 4 6 0.8 



Figure 1: Comparison for the distribution of the FDP with the hmiting distribution 
of the FDP, based on the two factor model over 10000 simulations. From the top row 
to the bottom, p = 100, 500, 1000 respectively. The first two columns correspond to 
t = 0.01 and the last two correspond to t = 0.001. The first and the third columns 
are for the FDP, while the second and the fourth are for the limit of the FDP. 



Table 2: Comparison for variance of number of false discoveries (column 2), variance of 



expression (18) (column 3) and variance of expression (17) (column 4) with t = 0.001 



based on 10000 simulations. 



Dependence Structure 


var(V(t)) 


var(V) 


vaiiy.up) 


Equal correlation 


180.9673 


178.5939 


180.6155 


Fan & Song's model 


5.2487 


5.2032 


5.2461 


Independent Cauchy 


9.0846 


8.8182 


8.9316 


Three factor model 


81.1915 


81.9373 


83.0818 


Two factor model 


53.9515 


53.6883 


54.0297 


Nonlinear factor model 


48.3414 


48.7013 


49.1645 



Estimate FDP: We now compare the estimated values of our method PFA (20) 



and Efron (2007) 's estimator with true values of false discovery proportion, under 6 
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Table 3: Comparison of FDR values for our method based on equation (28) (PFA) 
with Storey's procedure and Benjamini-Hochberg's procedure under six different 
dependence structures, where p = 2000, n = 200, t = 0.001, and /3j = 1 for 
i G {false null}. The computation is based on 10000 simulations and Monte Carlo 
errors are listed in the brackets. 





irue r usx 


r CIS. 


Storey 


JD-rl 


T7^ 1 1 J- • 

bquai correlation 


o.oTyo 


6.61% 


2.99%) 


3.90% 




c 070/ \ 

(i£).o i /o ) 


(-1 c 000/ \ 

(io.00/0 ) 


[iO.oo/o) 


(14.00/0 ) 


Fan & Song's model 


14.85% 


14.85% 


13.27% 


14.46% 




(11.76%) 


(11.58%) 


(11.21%) 


(13.46%) 


Independent Cauchy 


13.85% 


13.62% 


11.48% 


13.21% 




(13.60%) 


(13.15%) 


(12.39%) 


(15.40%) 


Three factor model 


8.08% 


8.29% 


4.00% 


5.46% 




(16.31%) 


(16.39%) 


(11.10%) 


(16.10%) 


Two factor model 


8.62% 


8.50% 


4.70% 


5.87% 




(16.44%) 


(16.27%) 


(11.97%) 


(16.55%) 


Nonlinear factor model 


6.63% 


6.81% 


3.20% 


4.19% 




(15.56%) 


(15.94%) 


(10.91%) 


(15.31%) 



different dependence structures. The results are depicted in Figure [2} Figure |3] and 
Table |4} Figure [2] shows that our estimated values correctly track the trends of FDP 
with smaller amount of noise. It also shows that both estimators tend to overestimate 
the true FDP, since FDPA(t) is an upper bound of the true FDP(t). They are close 
only when the number of false nulls pi is very small. In the current simulation setting, 
we choose pi = 50 compared with p = 1000, therefore, it is not a very sparse case. 
However, even under this case, our estimator still performs very well for six different 
dependence structures. Efron (2007) 's estimator is illustrated in Figure |2] with his 
suggestions for estimating parameters, which captures the general trend of true FDP 
but with large amount of noise. 
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False Discovery Proportion 




0.0 0.1 0.2 0.3 0.4 0.5 0.6 
False Discovery Proportion 



Three Factor Model 



Two Factor Model 



Nonlinear Factor Model 



0.0 0.1 0.2 0.3 0.4 0.5 0.6 
False Discovery Proportion 




0.0 0.1 0.2 0.3 0.4 0.5 0.6 
False Discovery Proportion 




0.0 0.1 0.2 0.3 0.4 0.5 0.6 
False Discovery Proportion 



Figure 2: Comparison of true values of False Discovery Proportion with estimated 
FDP by Efron (2007) 's procedure (crosses) and our PFA method (dots) under six 
different dependence structures, with p = 1000, pi = 50, n = 100, a = 2, t = 0.005 
and /Sj = 1 for i G {false null} based on 1000 simulations. The Z-statistics with 
absolute value less than or equal to Xq = 1 are used to estimate the dispersion variate 
A in Efron (2007) 's estimator. 



Table 4: Means and standard deviations of the relative error between true values of 
FDP and estimated FDP under the six dependence structures in Figure [2] REp is 
the relative error of our PFA estimator and REg is the relative error of Efron (2007) 's 

esti mator. RE is defined in Figure [3j 

mean(REp) SD(REp) mean(REE) SD(REe) 

Equal correlation 
Fan & Song's model 
Independent Cauchy 
Three factor model 
Two factor model 
Nonlinear factor model 



0.0342 


0.1579 


1.6132 


3.5320 


0.0685 


0.1801 


1.1549 


1.7180 


0.0611 


0.1685 


1.3086 


2.0900 


0.0456 


0.2072 


1.1953 


2.3070 


0.0428 


0.1625 


1.0658 


1.9039 


0.0548 


0.1815 


1.2446 


2.4927 



24 



Equal Correlation 



Fan & Song's Model 
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Two Factor Model 
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Relative Error (RE) 

Nonlinear Factor Model 
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I \ \ 1 1 
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Figure 3: Histograms of the relative error (RE) between true values of FDP and 
estimated FDP by our PFA method under the six dependence structures in Figure [2j 
RE is defined as (FDP(t) - FDP (t)) /FDP (t) if FDP(t) ^ and otherwise. 

6 Real Data Analysis 



Our proposed multiple testing procedures are now applied to the genome-wide asso- 
ciation studies, in particular the expression quantitative trait locus (eQTL) mapping. 
It is known that the expression levels of gene CCT8 are highly related to Down Syn- 
drome phenotypes. In our analysis, we use over two million SNP genotype data and 
CCT8 gene expression data for 210 individuals from three different populations, test- 
ing which SNPs are associated with the variation in CCT8 expression levels. To save 
space, we omit the description of the data pre-processing procedures. Interested read- 



ers can find more details from the websites: http: / /pngu. mgh.harvard.edu/~purcell/plink/res.shtml 
and ftp:/ /ftp. Sanger. ac.uk/pub/genevar/, and the paper Bradic, Fan & Wang (2010). 

We further introduce two sets of dummy variables (di, d2) to recode the SNP data, 
where di = ((ii,i, ■ ■ ■ , rfi,p) and d2 = {d2^i,--- ,(^2,^)5 representing three categories 
of polymorphisms, namely, {dij,d2j) = (0,0) for SNP^ = (no polymorphism). 
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(dij, (i2,j) = (I5O) for SNPj = 1 (one nucleotide has polymorphism) and {dij,d2j) = 
(0,1) for SNPj = 2 (both nucleotides have polymorphisms). Thus, instead of using 
model ([T|, we consider two marginal linear regression models between Y and dij: 

min E{Y - aij - Pi^jdijf, j = !,■■■ ,p (29) 

and between Y and d2j: 

min E(Y - a2,j - f32,jd2j f, j = !,■■■, p. (30) 

For ease of notation, we denote the recoded n x 2p dimensional design matrix as X. 
The missing SNP measurement are imputed as and the redundant SNP data are 
excluded. Finally, logarithm-transform of the raw CCT8 gene expression data are 
used. The details of our testing procedures are summarized as follows. 



To begin with, consider the full model Y = a + X/3 + e, where Y is the CCT8 
gene expression data, X is the n x 2p dimensional design matrix of the SNP 
codings and ~ A^(0, a^), i = 1, - ■ ■ ,n are the independent random errors. We 
adopt the refitted cross-validation (RCV) (Fan, Guo & Hao 2010) technique to 
estimate a by a, where LASSO is used in the first (variable selection) stage. 



Fit the marginal linear models (29) and (30) for each (recoded) SNP and obtain 
the least-squares estimate f3j for j = I,-- - ,2p. Compute the values of Z- 
statistics using formula ([s]), except that a is replaced by a. 

Calculate the P-values based on the Z-statistics and compute R(t) = i^{Pj : 
Pj < t} for a fixed threshold t. 

Apply eigenvalue decomposition to the covariance matrix S of the Z-statistics. 
Determine an appropriate number of factors k and derive the corresponding 
factor loading coefficients {bihY^zl^'fl^^ . 
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Figure 4: a of the three populations with respect to the selected model sizes, derived 
by using refitted cross-validation (RCV). 



Order the absolute-valued Z-statistics and choose the first m 



X 2p of 



them. Apply Li-regression to the equation set (21) and obtain its solution 
Wi, • • • , Wk- Plug them into (20) and get the estimated FDP(t). 



For each intermediate step of the above procedure, the outcomes are summarized 
in the following figures. Figure |4] illustrates the trend of the RCV-estimated standard 
deviation a with respect to different model sizes. Our result is similar to that in Fan, 
Guo & Hao (2010), in that although a is infiuenced by the selected model size, it is 
relatively stable and thus provides reasonable accuracy. The empirical distributions 
of the Z-values are presented in Figure [5} together with the fitted normal density 
curves. As pointed out in Efron (2007, 2010), due to the existence of dependency 
among the Z-values, their densities are either narrowed or widened and are not A^(0, 1) 
distributed. The histograms of the P- values are further provided in Figure |6j giving 
a crude estimate of the proportion of the false nulls for each of the three populations. 

The main results of our analysis are presented in Figures [7| which depicts the 
number of total discoveries R{t), the estimated number of false discoveries V{t) and 
the estimated False Discovery Proportion FDP(t) as functions of (the minus log^g" 
transformed) thresholding t for the three populations. As can be seen, in each case 
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Figure 5: Empirical distributions and fitted normal density curves of the Z- values 
for each of the three populations. Because of dependency, the Z- values are no longer 
iV(0, 1) distributed. The empirical distributions, instead, are A^(0.12, 1.22^) for CEU, 
A^(0.27, 1.392) for JPT and CHB, and iV(-0.04, 1.66^) for YRI, respectively The 
density curve for CEU is closest to A^(0, 1) and the least dispersed among the three. 
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Figure 6: Histograms of the P- values for each of the three populations. 

both R{t) and V{t) are decreasing when t decreases, but FDP(i) exhibits zigzag pat- 
terns and does not always decrease along with t, which results from the cluster effect 
of the P-values. A closer study of the outputs further shows that for all populations, 
the estimated FDP has a general trend of decreasing to the limit of around 0.1 to 
0.2, which backs up the intuition that a large proportion of the smallest P-values 
should correspond to the false nulls (true discoveries) when Z-statistics is very large; 
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however, in most other thresholding values, the estimated FDPs are at a high level. 
This is possibly due to small signal-to-noise ratios in eQTL studies. 

The results of the selected SNPs, together with the estimated FDPs, are depicted 
in Table [5] It is worth mentioning that Deutsch et al. (2005) and Bradic, Fan & Wang 
(2010) had also worked on the same CCT8 data to identify the significant SNPs in 
CEU population. Deutsch et al. (2005) performed association analysis for each SNP 
using ANOVA, while Bradic, Fan & Wang (2010) proposed the penalized composite 
quasi-likelihood variable selection method. Their findings were different as well, for 
the first group identified four SNPs (exactly the same as ours) which have the smallest 
P-values but the second group only discovered one SNP rs965951 among those four, 
arguing that the other three SNPs make little additional contributions conditioning 
on the presence of rs965951. Our results for CEU population coincide with that of 
the latter group, in the sense that the false discovery rate is high in our findings and 
our association study is marginal rather than joint modeling among several SNPs. 

Table 5: Information of the selected SNPs and the associated FDP for a particular 
threshold. Note that the density curve of the Z-values for CEU population is close 
to A^(0, 1), so the approximate FDP(t) equals pt/R{t) ~ 0.631. Therefore our high 
estimated FDP is reasonable. 



Population 


Threshold 


# Discoveries 


Estimated FDP 


Selected SNPs 


JPTCHB 


1.61 X 10-9 


5 


0.1535 


rs965951 rs2070611 
rs2832159 rs8133819 
rs2832160 


YRI 


1.14 X 10-9 


2 


0.2215 


rs9985076 rs965951 


CEU 


6.38 X 10-^ 


4 


0.8099 


rs965951 rs2832159 
rs8133819 rs2832160 



7 Discussion 

We have proposed a new method (principal factor approximation) for high dimen- 
sional multiple testing where the test statistics have an arbitrary dependence struc- 
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ture. For multivariate normal test statistics with a known covariance matrix, we 
can express the test statistics as an approximate factor model with weakly depen- 
dent random errors, by applying eigenvalue decomposition to the covariance matrix. 
We show the theoretical distribution of the false discovery proportion in large scale 
simultaneous tests when a common threshold is used for rejection. This result has 
important applications in controlling FDP and FDR. We also provide a procedure 
to estimate the realized FDP, which, in our simulation studies, correctly tracks the 
trend of FDP with smaller amount of noise. 

In the current paper, a fixed threshold is used for multiple testing under arbitrary 
dependency. Our future research interests will focus on how to take advantage of 
the dependence structure such that the testing procedure is more powerful or even 
optimal under arbitrary dependence structures. One possible way is to vary, according 
to the dependence structure, the threshold values for different hypotheses, based on 
successive conditioning. 



8 Appendix 

Lemma 1 is fundamental to our proof of Theorem 1 and Proposition 2. The result is 
known in probability, but has the formal statement and proof in Lyons (1988). 

Lemma 1 (Strong Law of Large Numbers for Weakly Correlated Variables). Let 

{Xn}'^=i be a sequence of real-valued random variables such that E\Xn\'^ < 1- // 

\Xn\<l a.S. and J2N>l^E\jf J^nKN^nl"^ < OO, then limN^oo ^J2n<N^n = a.s.. 

Proof of Proposition 2: Note that Pi = 2$(— |Zj|). Based on the expression of 

I{Pi < ■ ■ ■ 5 Wfc) r dependent random variables. 

J 1=1 

Nevertheless, we want to prove 

p p 
p-^^J(P, <t|Wi,-- - ,W^fc)^-^j9-^5^P(i^, <t|Wi,-- - ,W^fc) a.s.. (31) 

1=1 i=l 
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in (10 



Letting Xi = I{Pi <t\Wi,---, Wk) - P{Pi <t\Wi,--- , Wk), by Lemma 1 the con- 



clusion (31) is correct if we can show 



p 

Var (^p^i I{P, <t\Wi,-- - , Wk)) = Op{p-^) for some S > 0. 

i=l 

To begin with, note that 

p 

Var(j9-i^/(P, <t|l^i,--- ,Wk) 

i=l 

= p-2^Var(^J(P, <t|l^i,--- 

i=l 

+2p-^ CoY(^I{P,,<t\W,,--- ,Wk),I{P,<t\W: 



■ .Wk] 



l<i<j<p 



Since Var(/(Pj < t\Wi, ■ ■ ■ , Wk)) < |, the first term in the right-hand side of the last 
equation is Op(p~^). For the second term, the covariance is given by 

P{P^ <t,P,<t\W,,--- , Wk) -P{P^<t\W,,-- - , Wk)P{P, <t\Wu-- - , Wk). 

To simplify the notation, let p^j be the correlation between Ki and Kj. Without loss 
of generality, we assume p^j > (for pfj < 0, the calculation is similar). Denote by 

Cl,j = ai{-Zt/2 -Tji- Pi), C2,i = ai{Zt/2 -Tji- Pi). 

Then, from the joint normality, it can be shown that 



P{P^<t,P,<t\Wl,■■■ ,Wk) 

P{c2,i/a'i < Ki< Ci^i/ai,C2j/aj < Kj < cij/c 

{Pt,)'''z + C2, 
i^-pt,)'" 
{P%)"'^ + C2, 



X 



(f){z)dz. 



(32) 
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Next we will use Taylor expansion to analyze the joint probability further. We 
have shown that {Ki, ■ ■ ■ , Kp)^ ~ ^(0, A) are weakly dependent random variables. 
Let cov^j denote the covariance of and Kj, which is the {i,j)th element of the 
covariance matrix A. We also let b'lj = (1 - ELi^L)^^^(1 " ELi^i^)^^^- By the 
Holder inequality, 



P 



cov: 



k |l/2 



cov, 



k |2\l/4 
ij\ ) 



«J=1 



i=k+l 



11/4 



^ 



as p — 7- oo. For each $(■), we apply Taylor expansion with respect to (cofj^ )"'^/^, 



Therefore, we have (32) equals 



where we have used the fact that z(j)[z)dz = and /^(l — z'^)(f>{z)dz = 0. Now 
since P{Pi < ■ ■ ■ , Wk) = $(ci,i) - $(c2,i), we have 



Cov(^I{Pi<t\Wu--- ,Wk),I{Pj<t\Wi,--- ,Wk 

0(ci,i) - 0(c2,i)) (0(ci,j) - (t>{c2,j)^aiajC0Vij + o(coi;J). 



In the last line, ((/>(ci,j) — 0(c2,i)) (^(cij) — (I){c2j))aiaj is bounded by some constant 
except on a countable collection of measure zero sets. Let Cj be defined as the set 
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{zt/2 + 'ni + IJ'i = 0}U{zt/2-r]i- l^i = 0}. On the set Cf, (0(ci,i) - 0(c2,i))aj converges 
to zero as — >• oo. Therefore, (0(ci^j) — 0(c2,i)) (0(cij) — (f){c2j))aiaj is bounded by 
some constant on (Ui=i CiY- 

By the Cauchy-Schwartz inequahty and (CO) in Theorem 1, p~'^Yliij ~ 
0{p^^). On the set (IJiLi ^iY^ "^^ conclude that 

p 

i=l 

Hence by Lemma 1, 

p p 
p-i Y,m<t\W,,-- - , Wk) "-^ p-' Yl P^P^ <AWu--- . Wk) a.s.. 

i=l i=l 

Therefore, 



p p 




i=l 



With the same argument we can also show 

lim PoW{t) = po ^ Y] Wiai{zt/2 + Vi)) + ^(ai(^t/2 - Vi))] 

po^oo — ' L J 
i6{true null} 

for the high dimensional sparse case. The proof of Proposition 2 is now complete. 
Proof of Theorem 1: 
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For all bounded continuous functions /, 



lim E 



m 
m 



= E 
= E 
= E 



{po/p)ivit)/poy 

Po-s-oo R{t)/p 



f lim 



f 



(po/p)(Ei 



jG{truc null} 



P{P^<t\W^,■■■ ,Wk)/po) 



EllPiP^<t\W,r■■ ,Wk)/p 

P{P^<t\Wl,■■■ ,Wk] 



J, / 5^i£{true null} 



Ei=lPiP^<tm,■■■ ,Wk) 



In the second equality, we have used the convergence results in Proposition 2 and 
the Continuous Mapping Theorem. We have also used the Slutsky's Theorem, since, 
given Wi, ■ ■ ■ , Wk, R{t)/p converges in distribution to a constant Si=i P{Pi ^ 
t\Wi, • • • , Wk). Finally, by the Portmanteau's Lemma, 



V(t) D Sie (true null} 

lim — — = 

po^oo R[t) 



P{Pi <t\Wi 



T.UP{P^<Aw,. 



Wk 



i6{true null} 



The proof of Theorem 1 is complete. 

Proof of Theorem 2: Without loss of generality, we assume that the true value of 
w is zero, and we need to prove ||w||2 — Op{J ^). Let L : R^ R^ be defined by 



m 

Lj{w) = m"^ ^ bijSgn{Ki - hjw) 

i=l 

where sgn{x) is the sign function of x and equals zero when a; = 0. Then we want to 
prove that there is a root w of the equation L(w) = satisfying ||w||| = Op{k/m). 
We will apply Result 6.3.4 of Ortega and Rheinboldt (1970, page 163), so it suffices 
to show that with high probability, w^L(w) < with ||w||| = Bk/m for a sufficiently 
large constant B. 



Let V = w^L(w) = J2T=i where Vi = {hJw)sgn{Ki - hjw). By Cheby- 
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shev's inequality, P{V < E{V) + h x SD{V)) > 1 — h Therefore, to prove the 
resuh in Theorem 2, we want to derive the upper bounds for E{V) and SD(^) and 
show that \fh > 0, 3B and M s.t. Vm > M, P{V < 0) > 1 - h'^. 

We will first present a result from Polya (1945), which will be very useful for our 
proof. For x > 0, 



1 



1 + \l 1 — exp( x^) 



il + 5{x)) with sup|5(a;)| < 0.004. (33) 

x>0 



The variance of V is shown as follows: 



Vai{V) = J2 Var(Vi) + m'^ ^ Cov{Vi, Vj). 



i=l 



Write w = su with ||u||2 = 1 where s = {Bk/mf^. By (C2), (C3) and (C4) in 
Theorem 2, for sufficiently large m, 

mm m 

^ Var(V^i) = ^/(|bfu| < d)Vai{Vi) + ^/(|bfu| > d)Nai{Vi) 

i=l 1=1 
m 

= [^/(|bfu|>rf)Var(\/,)](l + o(l)), (34) 



i=l 



and 



^/(|bfu| < d)l{\hju\ < d)CoY{Vi,Vj] 



+2^/(|bfu| < d)l{\hju\ > d)Cov{V^,Vj 



■^/(|bfu| > rf)/(|bju| > d)GoN{V,,Vj^ 



[5^/(|bfu| >d)/(|bju| >d)Cov(\/,,F,)](l + o(l)). (35) 



We will prove (34) and (35) in detail at the end of proof for Theorem 2. 
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For each pair of Vi and Vj, it is easy to show that 



Cov(\/„ V,) = 4(bfw)(bjw) \P{K, < hjw, K, < bjw) - $(aibf w)$(aibjw) 
The above formula includes the Var(l^) as a specific case. 



By Polya's approximation (33), 

VaiiVi) = {hfwfexp I - ^(aibfw)2|(l + 6j) with \6j\ < 0.004. (36) 

Hence 

^/(|bfu| > c/)Var(\/i) < ^ exp | - -(a,c/s)2| (1 + (5^) 

i=l j=l 

< 2ms^exp| {ammdsY^. 

To compute Cov(Vi, Vj), we have 

P{Ki < hjw, Kj < bjw) 

^ (i-iP^i)v^ )K (i-iP^i)v^ )^^^^'^ 

= $(ajbfw)$(ajbjw) + 0(ajbf w)0(ajbJ'w)aiajCOffj(l + o(l)), 
where 5^^ = 1 if pf^ > and —1 otherwise. Therefore, 

Cov(V;, Vj) = 4(bfw)(bjw)(/.(aibfw)0(ajbjw)aia,co4(l + o(l)), (37) 

and 

l^/dbful >ci)/(|bju| >ci)Cov(\/„\/,)l 
< ^s^exp I - (araindsY^al^^Jcov^jKl + o(l)). 
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Consequently, we have 



2 r 2 1 r 1 \ \ 

* j 



m 



cov 



TT 



We apply (CI) in Theorem 2 and the Cauchy-Schwartz inequality to get ^ ^ . cof < 
(X^iLfc+i '^D^''^ — '^^^'^ ■• conclude that the standard deviation of V is bounded by 



V^sm ^/^exp I - ^ (amines) ^ I amax(?7)^''^- 



In the derivations above, we used the fact that bj"u < ||bj||2 < 1, and the covariance 
matrix for Ki in (21) of the paper is a submatrix for covariance matrix of Ki in (10). 

Next we will show that E{y^ is bounded from above by a negative constant. Using 
x($(a;) — |) > 0, we have 



2 r li 



i=l 



> — $:/(|bru|>d)ka,ds)-i 

i=l 



2rfs 
m 



[$(aic/s) - - 

i=l 



r, 7 1 

— yj(|bfu| <rf) $(a,rfs)-- 
m L 2 



By (C2) in Theorem 2, ^ Xlili -^(|bfu| < c?) — )■ 0, so for sufficiently large m, we have 



i5(V)>-V[$(a.rfs) 

i=l 



ll 

2 



An application of (33) to the right hand side of the last line leads to 



-EiV) > ^ V - a/i - exp { - t^a^^dsf). 

m ^-^ 2 V TT ^ 
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Note that 



SO we have 

^2 nr . ^ ^ 



d? p2 f 1 

-E{y) > ysM-aminexp| - — (ammC^s) 



To show that V/i > 0, 35 and M s.t. Vm > M, P(V < 0) > 1 - /i^^ by 
Chebyshev's inequahty and the upper bounds derived above, it is sufficient to show 
that 



'y'^^minexp I - -^{amindsf^ > hsm exp | - ^amindsf^a^.^^rj^^'^. 

Recall s = (Bk/mY^'^, after some algebra, this is equivalent to show 

d'iBkY^^rry'/^expl — ia^i^dsy} > 2^/^^. 

By (C3), then for all h > 0, when B satisfies d'^{Bky/^{n)-^/'^ > 2hr]^/^S, we have 
P{V < 0) > 1 - h-^. Note that k = 0{m'') and r] = 0{m^''), so k-^^^r]^/^ = 0(1). 



To complete the proof of Theorem 2, we only need to show that (34) and (35) are 
correct. 



To prove (34), by (36) we have 



^/(|bfu| < d)V8iT{Vi) < ^/(|bfu| < d)s^d^ 



and 



i=l i=l 



mm 2 



_ 2 2 



^/(|bfu| > d)VaT{Vi) > ^/(|bfu| > d)s^d^exp 

1=1 i=l 

Recall s = {Bk/mf^, then by (C3) and (C4), exp jfaLx^^} = ^(1). Therefore, by 
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(C2) we have 



Y:tiI{\hJn\<d)V^T{V,) 
Er=i/(|bfu|>rf)Var(\/,) 



— )■ as m — )■ cxD, 



so (34) is correct. With the same argument and by (37), we can show that (35) is 



also correct. The proof of Theorem 2 is now complete. 



Proof of Theorem 3: Letting 



^1 = £ [*(ai(-2t/2 + bfw)) - $(a,(zt/2 + bfw)) 

i=l 

A2 = \ ^iai{zt/2 - bfw)) - $(ai(zt/2 - hjw)) 



i=l 



and 



we have 



FDP(t) - FDPA(t) = (Ai + A2)/i?(t). 



Consider Ai = Yl^i=i By the mean value theorem, there exists C,i in the interval of 



(bj w, w), such that Aij = (t){ai{zt/2 + 6))'^ibj (w — w) where is the standard 
normal density function. 

Next we will show that (f){ai{zt/2 + ^i))o,i is bounded by a constant. Without loss 
of generality, we discuss about the case in (C6) when zt/2 + bf w < — r. By Theorem 
2, we can choose sufficiently large m such that Zt/2 + 6 < — t/2. For the function 
g{a) = exp(— a^x^/8)a, g{a) is maximized when a = 2/x. Therefore, 



2TT(j){ai{zt/2 + ^i))ai < aiexp(-ajr /8) < 2exp(-l/2)/r. 



For Zt/2 + bf w > r we have the same result. In both cases, we can use a constant D 
such that (f){ai{zt/2 + ^i))o,i < D. 



By the Cauchy-Schwartz inequality, we have Y7i=i \^ih\ < {p Y7i=i ^IhY^"^ — {p^hY^"^ 
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Therefore, by the Cauchy-Schwartz inequahty and the fact that Y^h=i < P, we have 



p k 

I All < D^^'^\bih\\wh-Wh\ 

1=1 h=l 

k 

< Dj2ip>^hy^^\wh-wh\ 



h=l 



k 



< 



h=l h=l 
< Dp||w — w||2. 



1/2 



By Theorem 1, ||w- w||2 = Op{^^). By (C5) in Theorem 3, R{t)/p > H ioi H > 
when p — >■ oo. Therefore, \Ai/R{t)\ — Op{^J^). For A2, the result is the same. The 
proof of Theorem 3 is now complete. 

Proof of Theorem 4: Note that ||Wls - Wls||2 = ||(X^X)-iX^/u||2. By the 
definition of X, we have X^X = A, where A = diag(Ai, • • • , \k). Therefore, by the 
Cauchy-Schwartz inequality. 



|w.-w-,,^[E(^)t'^wb(E^) 

i=i * i=i ' 



1 \ 1/2 



The proof is complete. 
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Figure 7: Number of total discoveries, estimated number of false discoveries and esti- 
mated False Discovery Proportion as functions of thresholding t for CEU population 
(row 1), JPT and CHB (row 2) and YRI (row 3). The x-coordinate is — logt, the 
minus logj^o'^ransformed thresholding. 
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